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Abstract. We show that linear differential operators with polynomial 
coefficients over a field of characteristic zero can be multiplied in quasi- 
optimal time. This answers an open question raised by van der Hoeven. 



1. Introduction 

The product of polynomials and the product of matrices are two of the 
most basic operations in mathematics; the study of their computational 
complexity is central in computer science. In this paper, we will be inter- 
ested in the computational complexity of multiplying two linear differential 
operators. These algebraic objects encode linear differential equations, and 
form a non-commutative ring that shares many properties with the commu- 
tative ring of usual polynomials [21, 22]. The structural analogy between 
polynomials and linear differential equations was discovered long ago by 
Libri and Brassinne [18, 7, 13]. Yet, the algorithmic study of linear differen- 
tial operators is currently much less advanced than in the polynomial case: 
the complexity of multiplication has been addressed only recently [16, 6], 
but not completely solved. The aim of the present work is to make a step 
towards filling this gap, and to solve an open question raised in [16]. 

Let K be an effective field. That is, we assume data structures for repre- 
senting the elements of K and algorithms for performing the field operations. 
The aim of algebraic complexity theory is to study the cost of basic or more 
complex algebraic operations over K (such as the cost of computing the 
greatest common divisor of two polynomials of degrees less than d in K[x], 
or the cost of Gaussian elimination on an r x r matrix in ]K rxr ) in terms of 
the number of operations in K. The algebraic complexity usually does not 
coincide with the bit complexity, which also takes into account the poten- 
tial growth of the actual coefficients in K. Nevertheless, understanding the 
algebraic complexity usually constitutes a first useful step towards under- 
standing the bit complexity. Of course, in the special, very important case 
when the field IK is finite, both complexities coincide up to a constant factor. 
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The complexities of operations in the rings and W xr have been 

intensively studied during the last decades. It is well established that poly- 
nomial multiplication is a commutative complexity yardstick, while matrix 
multiplication is a non- commutative complexity yardstick, in the sense that 
the complexity of operations in K[x] (resp. in K rxr ) can generally be ex- 
pressed in terms of the cost of multiplication in K[x] (resp. in IK rxr ), and 
for most of them, in a quasi-linear way [2, 4, 8, 24, 14]. 

Therefore, understanding the algebraic complexity of multiplication in 
lC[x] and K. rxr is a fundamental question. It is well known that polyno- 
mials of degrees < d can be multiplied in time M(d) = 0{d\og d log log d) 
using algorithms based on the Fast Fourier Transform (FFT) [11, 25, 9], 
and two r x r matrices in K. rxr can be multiplied in time 0{r u ), with 
2 ^ w ^ 3 [27, 23, 12]. The current tightest upper bound, due to Vas- 
silevska Williams [28], is u < 2.3727, following work of Coppersmith and 
Winograd [12] and Stothers [26]. Finding the best upper bound on to is one 
of the most important open problems in algebraic complexity theory. 

In a similar vein, our thesis is that understanding the algebraic com- 
plexity of multiplication of linear differential operators is a very important 
question, since the complexity of more involved, higher-level operations on 
linear differential operators can be reduced to that of multiplication [17]. 

From now on, we will assume that the base field K has characteristic 
zero. Let K[x, d] denote the associative algebra K{x, d; dx = xd + 1) of 
linear differential operators in d = 4- with polynomial coefficients in x. 
Any element L of JK[x,<9] can be written as a finite sum '^ZiLi{x)d' 1 for 
uniquely determined polynomials Li in ~K[x]. We say that L has bidegree 
less than (d, r) in (x, d) if L has degree less than r in d, and if all Lj's have 
degrees less than d in x. The degree in d of L is usually called the order 
of L. 

The main difference with the commutative ring ~K[x, y] of usual bivariate 
polynomials is the commutation rule dx = xd + 1 that simply encodes, in 
operator notation, Leibniz's differentiation rule -^(xf) = x-^(f) + f. This 
slight difference between K[x, d) and K[x, y] has a considerable impact on 
the complexity level. On the one hand, it is classical that multiplication in 
K[x,y] can be reduced to that of polynomials in K[x], due to a technique 
commonly called Kronecker's trick [19, 14]. As a consequence, any two poly- 
nomials of degrees less than d in x, and less than r in y, can be multiplied 
in quasi-optimal time 0(M(dr)). On the other hand, under our hypothesis 
that K. has characteristic zero, it was shown by van der Hoeven [16] that the 
product of two elements from K[x, d] of bidegree less than {n, n) can be com- 
puted in time O(n^). Moreover, it has been proved in [6] that conversely, 
multiplication in K. nxn can be reduced to a constant number of multiplica- 
tions in lK[x, d], in bidegree less than (n,n). In other words, multiplying 
operators of well-balanced bidegree is computationally equivalent to matrix 
multiplication. 
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However, contrary to the commutative case, higher-level operations in 
~K[x, d], such as the least common left multiple (LCLM) and the greatest com- 
mon right divisor (GCRD), do not preserve well-balanced bidegrees [15, 5]. 
For instance, the LCLM of two operators of bidegrees less than (n, n) is of 
bidegree less than (2n(n + 1), 2n) = 0(n 2 ,n), and this bound is generically 
reached. This is a typical phenomenon: operators obtained from computa- 
tions in d] tend to have much larger degrees in x than in d. 

In the general case of operators with possibly unbalanced degrees d in x 
and r in <9, the naive algorithm has cost 0(d 2 r 2 min(<i, r)); a better algo- 
rithm, commonly attributed to Takayama, has complexity (D(dr min(d, r)). 
We refer to [6, §2] for a review of these algorithms. When r ^ d ^ r 4_a; , 
the best current upper bound for multiplication is 0(r u ~ 2 d 2 ) [16, 17]. It 
was asked by van der Hoeven [16, §6] whether this complexity could be low- 
ered to 0(r w ~ 1 d). Here, and hereafter, the soft-0 notation C>{) indicates 
that polylogarithmic factors in d and in r are neglected. The purpose of 
the present work is to provide a positive answer to this open question. Our 
main result is encapsulated in the following theorem: 

Theorem 1. Let IK be an effective field of characteristic zero. Operators in 
d] of bidegree less than (d, r) in (x, d) can be multiplied using 

<5(min(d, rf- 2 dr) 

operations in K. 

Actually, we will prove slightly more refined versions of this theorem (see 
Theorems 3 and 5 below), by making the hidden log-terms in the complexity 
explicit. 

In the important case d ^ r, our complexity bound reads 0(r u '~ 1 d). This 
is quasi-linear (thus quasi-optimal) with respect to d. Moreover, by the 
equivalence result from [6, §3], the exponent of r is also the best possible. 
Besides, under the (plausible, still conjectural) assumption that uj = 2, the 
complexity in Theorem 1 is almost linear with respect to the output size. 
For r = 1 we retrieve the fact that multiplication in K[x] in degree < d 
can be done in quasi-linear time 0(d); from this perspective, the result of 
Theorem 1 can be seen as a generalization of the fast multiplication for usual 
polynomials. 

In an expanded version [3] of this paper, we will show that analogues of 
Theorem 1 also hold for other types of skew polynomials. More precisely, 
we will prove similar complexity bounds when the skew indeterminate d : 
f(x) i— >• f'(x) is replaced by the Euler derivative 5 : f(x) i-> xf'(x), or a 
shift operator o c : f(x) h4 f(x + c), or a dilatation \q '■ f( x ) ^ f(l x )- Most 
of these other cases are treated by showing that rewritings such as 5 -o- xd 
or <t c o exp(c<9) can be performed efficiently. We will also prove complexity 
bounds for a few other interesting operations on skew polynomials. 

Main ideas. The fastest known algorithms for multiplication of usual poly- 
nomials in K[x] rely on an evaluation-interpolation strategy at special points 
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in the base field K [11, 25, 9]. This reduces polynomial multiplication to the 
"inner product" in K. We adapt this strategy to the case of linear differential 
operators in K[x, d}: the evaluation "points" are exponential polynomials of 
the form x n e ax on which differential operators act nicely. With this choice, 
the evaluation and interpolation of operators is encoded by Hermite eval- 
uation and interpolation for usual polynomials (generalizing the classical 
Lagrange interpolation), for which quasi-optimal algorithms exist. For op- 
erators of bidegree less than (d,r) in (x,d), with r ^ d, we use p = 0(r/d) 
evaluation points, and encode the inner multiplication step by p matrix 
multiplications in size d. All in all, this gives an FFT-type multiplication 
algorithm for differential operators of complexity (5(d a;_1 r). Finally, we re- 
duce the case r < d to the case r ^ d. To do this efficiently, we design a 
fast algorithm for the computation of the so-called reflection of a differential 
operator, a useful ring morphism that swaps the indeterminates x and d, 
and whose effect is exchanging degrees and orders. 

2. Preliminaries 

Recall that IK denotes an effective field of characteristic zero. Throughout 
the paper, Kfa;]^ will denote the set of polynomials of degree less than d 
with coefficients in the field IK, and ~K[x, d]^^ will denote the set of linear 
differential operators in IK[x, d] with degree less than r in d, and polynomial 
coefficients in 

The cost of our algorithms will be measured by the number of field oper- 
ations in IK they use. We recall that polynomials in can be multiplied 
within M(d) = Oid logd log log d) = 0(d) operations in K, using the FFT- 
based algorithms in [25, 9], and that u denotes a feasible exponent for matrix 
multiplication over IK, that is, a real constant 2 ^ co ^ 3 such that two r x r 
matrices with coefficients in IK can be multiplied in time C(r w ). Through- 
out this paper, we will make the classical assumption that M(d)/d is an 
increasing function in d. 

Most basic polynomial operations in K.[x]d (division, Taylor shift, ex- 
tended gcd, multipoint evaluation, interpolation, etc.) have cost 0(d) [2, 4, 
8, 24, 14] . Our algorithms will make a crucial use of the following result due 
to Chin [10], see also [20] for a formulation in terms of structured matrices. 

Theorem 2 (Fast Hermite evaluation-interpolation). Let K be an effective 
field of characteristic zero, let cq, . . . , Ck-i be k positive integers, and let d = 
J2 i Ci. Given k mutually distinct points cto, . . . , a^-i in IK and a polynomial 
P € IK[x]d, one can compute the vector of d values 

H = (P(o ),P'(qo),...,P( c °- 1 )(qo), , 

P(a fc _i), P'K_i), • • • , P (cfc - 1_1) (ce fe _i)) 

in 0(M(d) log k) = Oid) arithmetic operations in IK. Conversely, P is 
uniquely determined by 7i, and its coefficients can be recovered from % in 
0(M(d) log k) = 0(d) arithmetic operations in K. 
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3. The new algorithm in the case r ^ d 

3.1. Multiplication by evaluation and interpolation. Most fast al- 
gorithms for multiplying two polynomials P, Q £ K[ir]<2 are based on the 
evaluation-interpolation strategy. The idea is to pick 2d — 1 distinct points 
ao, ■ ■ ■ , «2d-2 in K, and to perform the following three steps: 

(1) (Evaluation) Evaluate P and Q at ao, . . . , «2d-2- 

(2) (Inner multiplication) Compute the values (PQ)(ai) = P(cti)Q(ai) 
for i < 2d — 1. 

(3) (Interpolation) Recover PQ from (PQ)(ao), . . . , (PQ) (02^-2) • 
The inner multiplication step requires only 0(d) operations. Conse- 
quently, if both the evaluation and interpolation steps can be performed 
fast, then we obtain a fast algorithm for multiplying P and Q. For instance, 
if K contains a 2 p -th primitive root of unity with 2 P_1 ^ 2d — 1 < 2 P , then 
both evaluation and interpolation can be performed in time 0{d\ogd) using 
the Fast Fourier Transform [11]. 

For a linear differential operator L E M,[x,d]d t r it is natural to consider 
evaluations at powers of x instead of roots of unity. It is also natural to 
represent the evaluation of L at a suitable number of such powers by a 
matrix. More precisely, given k £ N, we may regard L as an operator from 
IfC[x]fc to We may also regard elements of IfC[x]^. and IK[:z;]fc + d as 

column vectors, written in the canonical bases with powers of x. We will 
denote by 

/ L(l) L{x k - l )v \ 

^k+d,k = 

V L(l) k+d ^ ••• L(x k ~ 1 )k+d-i J 

the matrix of the IK-linear map L : ~K[x]f; — >■ IK^fc+d with respect to these 
bases. Given two operators K,L in K.[x, <9]d jr , we clearly have 

For k = 2r (or larger), the operator KL can be recovered from the matrix 
<3?^J M ' 2r , whence the formula 

/, \ fh 2r+2d,2r fh 2r+2d,2r+d ^2r+d,2r 

\ L ) KL — L 

yields a way to multiply K and L. For the complexity analysis, we thus 
have to consider the three steps: 

(1) (Evaluation) Computation of <j>^+ M ' 2r + d anc j G f ^ 2 ^ +d ^ 2r f r0 m K 
and L. 

(2) (Inner multiplication) Computation of the matrix product (1). 

(3) (Interpolation) Recovery of KL from & 2 ^ 2d ' 2r . 

In [16, 6], this multiplication method was applied with success to the case 
when d = r. In this "square case", the following result was proved in [6, 
§4.2]. 
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Lemma 1. Let L £ K[x, d\d,d- Then 

(1) We may compute $> 2d,d as a function of L in time 0(dM(d)); 

(2) We may recover L from & 2d ' d in time 0(dM(d)). 

3.2. Evaluation interpolation at exponential polynomials. Assume 
now that r ^ d. Then a straightforward application of the above evaluation- 
interpolation strategy yields an algorithm of sub-optimal complexity. In- 
deed, the matrix & 2 £~^ 2d ' 2r contains a lot of redundant information and, 
since its mere total number of elements exceeds r 2 , one cannot expect a 
direct multiplication algorithm of quasi-optimal complexity 0(d w ~ 1 r). 

In order to maintain quasi-optimal complexity in this case as well, the 
idea is to evaluate at so called exponential polynomials instead of ordinary 
polynomials. More specifically, given L £ K[x, d)d r and a £ K, we will use 
the fact that L also operates nicely on the vector space K[x]e ax . Moreover, 
for any P € K[x], we have 

L(Pe ax ) = L Xa (P)e ax , 

where 

L* a = ^L;(x)(d + a)* 

i 

is the operator obtained by substituting d + a for d in L = J2i Li{x)d l . 
Indeed, this is a consequence of the fact that, by Leibniz's rule: 

d\Pe ax ) = ^jcJd^p] e ax = (d + aY(P)e ax . 

Now let p = \r/d~\ and let ao, . . . , a v -\ be p pairwise distinct points in K. 
For each integer k ^ 1, we define the vector space 

Y k = K[x] k e a ° x © • • • © K[x] k e a v- ix 
with canonical basis 

(e a ° x x k ~ 1 e a ° x e a p-i x x k ~ 1 e° p ~ lX ) 

Then we may regard L as an operator from Y k into Y k +d and we will denote 
by $^ +<i ' fc ] the matrix of this operator with respect to the canonical bases. 
By what precedes, this matrix is block diagonal, with p blocks of size d: 

I q,k+d,k V 



^[k+d,k] 

L 



Mk 



Let us now show that the operator L is uniquely determined by the matrix 
3> [ l d ' d \ and that this gives rise to an efficient algorithm for multiplying two 
operators in K[x, d\d,r- 

Lemma 2. Let L £ d]d t r with r ^ d. Then 
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(1) We may compute as a function of L in time 0(d M(r) logr); 

(2) We may recover L from the matrix <J>^ rf '^ in time 0(d M(r) log r). 

Proof. For any operator L = J2i<d,j<r Li^d 3 in d]d, r , we define its 
truncation L* at order 0(<9 d ) by 

L* = L iJ xi d j - 

i,j<d 

Since L — L* vanishes on K[x]d, we notice that (& 2d ' d = & 2d *' d . 

If L G K[<9] r , then L* can be regarded as the power series expansion of L 
at d = and order d More generally, for any i E {0, . . . , p — 1}, the operator 
L* Ka .(d) = L{d + a.i)* coincides with the Taylor series expansion at d = a{ 
and order d: 

L* Kai (d) = L(oi) + L'(oi)d +• • ■ + TS ^yL { - d - l \a l )d d - 1 . 

In other words, the computation of the truncated operators L* KaQ , . . . , L* afi 
as a function of L corresponds to a Hermite evaluation at the points «j, with 
multiplicity a = d at each point cti. By Theorem 2, this computation can be 
performed in time 0(M(pd)\ogp) = 0(M(r) logr). Furthermore, Hermite 
interpolation allows us to recover L from L* KaQ , . . . , L* _ 1 with the same 
time complexity C(M(r) logr). 

Now let L £ IK [a;, <9]d,r and consider the expansion of L in x 

L(x,d) = L (d) + --- + x d - 1 L d _ 1 (d). 

For each i, one Hermite evaluation of Li allows us to compute the L* a . i with 
j < p in time C(M(r) logr). The operators L^a- with j < p can therefore 
be computed in time 0(d M(r) logr). By Lemma 1, we need 0(rM(d)) = 
0(dM(r)) additional operations in order to obtain ^^ d ' d \ Similarly, given 
^[2d,d]^ L emma 2 allows us to recover the operators L* Ka . with j < p in time 
0(d M(r)). Using d Hermite interpolations, we also recover the coefficients 
Li of L in time 0{d M(r) log r). □ 

Theorem 3. Assume r ^ d and let K, L 6 9]d jr . T/ien £/ie product KL 
can be computed in time 0(d UJ ~ 1 r + d M(r) log r). 

Proof. Considering K and L as operators in ~K[x, d]sd,3r, Lemma 2 implies 
that the computation of $^' 3 ^ and $|^ ,2d l as a function of X and L can 
be done in time 0(dM(r) logr). The multiplication 

$ [4d,2d] = $ [4d,3d] $ [3d,2d] 
K L K L 

can be done in time 0{d ul p) = 0{d ul ~ l r). Lemma 2 finally implies that we 
may recover KL from ^j^ 2 ^ in time 0{d M(r) logr). □ 
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4. The new algorithm in the case d > r 



Any differential operator L £ K[sc, <9]d,r can be written in a unique form 



This representation, with x on the left and d on the right, is called the 
canonical form of L. 

Let <p : K[x, d] — > K[x, d] denote the map defined by 



In other words, ip is the unique IfC-algebra automorphism of K[x, d] that 
keeps the elements of K fixed, and is defined on the generators of K[x, d] 
by p(x) = d and p(d) = —x. We will call ip the reflection morphism of 
K[x, c?]. The map tp enjoys the nice property that it sends K[x,d]d, r onto 
^[x, d} r; d- In particular, to an operator whose degree is higher than its order, 
(p associates a "mirror operator" whose order is higher than its degree. 

4.1. Main idea of the algorithm in the case d > r. If d > r, then the 
reflection morphism if is the key to our fast multiplication algorithm for 
operators in K[x, d]d, r , since it allows us to reduce this case to the previous 
case when r ^ d. More precisely, given K, L in WL[x, d]d, r with d > r, the 
main steps of the algorithm are: 

(51) compute the canonical forms of (f(K) and <p(L), 

(52) compute the product M = tp(K)ip{L) of operators ^(K) € !K[a;,3] r ^ 
and <p(L) G lK[x, d] r ^, using the algorithm described in the previous 
section, and 

(53) return the (canonical form of the) operator KL = (p~ 1 (M). 

Since d > r, step (S2) can be performed in complexity 0(r ul ~ 1 d) using 
the results of Section 3. In the next subsection, we will prove that both 
steps (SI) and (S3) can be performed in (D(rd) operations in K. This will 
enable us to conclude the proof of Theorem 1. 

4.2. Quasi-optimal computation of reflections. We now show that the 
reflection and the inverse reflection of a differential operator can be com- 
puted quasi-optimally. The idea is that performing reflections can be inter- 
preted in terms of Taylor shifts for polynomials, which can be computed in 
quasi- linear time using the algorithm from [1]. 

A first observation is that the composition ip o (p is equal to the involution 



some scalars Li j G K. 



i<r,j<d 




ip : K[x, d] — > K[x, d] defined by 
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As a direct consequence of this fact, it follows that the map y? -1 is equal 
to (p o ij). Since ip{L) is already in canonical form, computing ip(L) only 
consists of sign changes, which can be done in linear time 0{dr). Therefore, 
computing the inverse reflection ip~ l (L) can be performed within the same 
cost as computing the direct reflection <p(L), up to a linear overhead 0(rd). 

In the remainder of this section, we focus on the fast computation of 
direct reflections. The key observation is encapsulated in the next lemma. 
Here, and in what follows, we use the convention that the entries of a matrix 
corresponding to indices beyond the matrix sizes are all zero. 

Lemma 3. Assume that (pij) and (qij) are two matrices in K rxd such that 



where we use the convention that pij = as soon as i ^ r or j ^ d. 
Proof. Leibniz's differentiation rule implies the commutation rule 



ij ij 



Then 





Together with the hypothesis, this implies the equality 



5>flw)-jj-* =Y,Wpij)& 



ij ij 




We conclude by extraction of coefficients. 



□ 



Theorem 4. Let L € K[x,d]d r - Then we may compute tp(L) and (p 1 (L) 
using 0(min(<i M(r), r M(d))) = 0(rd) operations in K. 



Proof. We first deal with the case r ^ d. If L = J2i<r,j<d Pij xJ 9 l , then by 
the first equality of Lemma 3, the reflection <f(L) is equal to 



<p{L)= Yl Pi E </<,/( -n J < r 




where 
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For any fixed k with 1 — r ^ k ^ d— 1, let us introduce G k = Y^i ^-(li,i+kX t+k 
and F k = J2i^-Pi,i+kX i+k ■ These polynomials belong to K[x]^, since pij = 
qij = for j d. If k ^ 0, then Equation (2) translates into 



Indeed, Equation (2) with j = i + k implies that G k (x) is equal to 



Similarly, if k > 0, then the coefficients of x l in Gk(x) and -Ffc(x + 1) still 
coincide for all i ^ k. In particular, we may compute Gi_ r , . . . , G^-i from 
-Fi_ r , . . . , -Frf-i by means of d + r ^ 2r Taylor shifts of polynomials in 
Using the fast algorithm for Taylor shift in [1], this can be done in time 
0(rM(d)). 

Once the coefficients of the G k s are available, the computation of the 
coefficients of (p(L) requires 0(dr) additional operations. 

If d > r, then we notice that the equality (2) is equivalent to 



as can be seen by expanding the binomial coefficients. Redefining G k := 
J2i i*Qi+k,iX t+k and F k := J2i^-Pi+k,i xt+k , similar arguments as above show 
that tp(P) can be computed using 0(d M(r)) operations in K. 

By what has been said at the beginning of this section, we finally conclude 
that the inverse reflection (p~ l (L) = (p(ip(L)) can be computed for the same 
cost as the direct reflection tp(L). □ 

4.3. Proof of Theorem 1 in the case d > r. We will prove a slightly 
better result: 

Theorem 5. Assume d > r and K,L £ <9]d, r - Then the product KL 
can be computed using 0(r ul ~ 1 d + rM(d) log d) operations in K. 

Proof. Assume that K and L are two operators in K[x,9]^. r with d > r. 
Then <f(K) and (f(L) belong to M.[x, d] r ^, and their canonical forms can be 
computed in 0(dM(r)) operations by Theorem 4. Using the algorithm from 
Section 3, we may compute M = (p(K)ip(L) in 0(r w ~ x d + rM(d) log d) op- 
erations. Finally, KL = </? _1 (M) can be computed in 0(r M(d)) operations 
by Theorem 4. We conclude by adding up the costs of these three steps. □ 



G k (x) 



F k (x + 1). 
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